27 - Pattern Recognition [PR] - PR 23 [ID:23526]

Welcome back to Pattern Recognition. So today we want to start talking about support vector

machines. Support vector machines are a very powerful classification technique that is

built on top of convex optimization. So you will see that it essentially builds on top

of everything that we've seen so far in this class and we will see that we can solve some

of the problems much more elegantly using the tricks that we will show you in the next

couple of videos.

Okay so today support vector machines and what is our motivation? Well again we want

to look into linear decision boundaries and we assume that we have two linearly separable

classes and then we want to compute a linear decision boundary that allows the separation

of the training data and that generalizes very well. This builds on some observations

by Vapnik, and already in 1996 he could show that the optimal separating hyperplane separates the two classes while maximizing the distance to the closest point from either class. This then results in a unique solution for the hyperplane and in most cases better generalization. So

let's look into the type of problem that we want to solve. Here we have linearly separable

and non-separable cases. On the left hand side you can see we have the green and the

blue dots and they can be separated with just a single line so this is a separable case.

On the right hand side you see a non-separable case so you see that there are only two points

that have been switched and with these two points the problem suddenly becomes no longer

solvable with a single hyperplane that would separate the two clusters. So if you remember

the perceptron in the right hand case it would essentially iterate until infinity. So let's

stick first with the separable case on the left hand side. Now the problem here is you

see that there is quite some space between the two point clouds and this means that there

are quite a few different solutions that you can find and therefore we don't have a unique

solution just by aiming at separating the two classes. So one thing that we could do

in this case is for example to average the perceptron solutions. Now the idea that was

introduced by Vapnik is actually that you want to maximize the distance between the

points of the two classes, and here we actually show such a solution: you see this line, a hyperplane with the normal vector alpha, and it essentially maximizes the distance between the two point sets. On the left hand side we have the hard margin problem, and in the hard margin problem we really require the two sets to be absolutely separable. Then there is the soft margin problem, where we will use a couple more tricks to also allow misclassification. So here you can see that there is one point that cannot be separated in the second point cloud, and here we can then introduce tricks in order to still be able to compute a unique decision boundary. This is then called the soft margin problem because we allow some misclassifications that are caused by the optimally separating hyperplane.
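As a small aside that is not part of the lecture, here is a minimal scikit-learn sketch of the hard versus soft margin distinction just described; the toy points, labels, and the two values of the regularization parameter C are made up for illustration (a very large C essentially enforces the hard margin, a moderate C gives the soft margin that tolerates some misclassification).

import numpy as np
from sklearn.svm import SVC

# Made-up toy data: one point of class +1 sits inside the class -1 cloud,
# so the two classes are not linearly separable.
X = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 0.5],    # class -1 cloud
              [3.0, 3.0], [3.5, 2.5], [0.2, 0.2]])   # class +1, last point lies inside the -1 cloud
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard margin problem; a moderate C is the
# soft margin problem, which tolerates some misclassification.
hard = SVC(kernel="linear", C=1e6).fit(X, y)
soft = SVC(kernel="linear", C=1.0).fit(X, y)

for name, clf in [("hard (approx.)", hard), ("soft", soft)]:
    alpha, alpha_0 = clf.coef_[0], clf.intercept_[0]   # normal vector and bias of the learned hyperplane
    print(name, "alpha =", alpha, "alpha_0 =", alpha_0,
          "training accuracy =", clf.score(X, y))

Let's have a look at a bit of linear algebra.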

We assume that we have an affine function that defines the decision boundary so we can

write this up simply as the normal vector alpha transpose x plus some bias alpha zero

and this then means that for any point x on the hyperplane we essentially have alpha of x equal to zero. So if you are on the plane, then this inner product plus alpha zero is exactly equal to zero. Note also that if you have two points on the hyperplane, then the following condition has to hold: if you take x one minus x two and multiply this difference with alpha transpose, the result also has to be zero. So this is a necessary condition for all points x one and x two that lie on the hyperplane. Also note that the actual normal vector of the hyperplane needs to be normalized by its length. So alpha is of course a vector that points in the normal direction, but to get the unit normal vector you have to divide by the length of alpha, where we take the two-norm. Note that this is very important in particular if you are considering

signed distances. So if you really want to compute signed distances to the hyperplane, then what you actually have to do is scale with the inverse length of alpha, that is, divide by the two-norm, to compute them.
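As a small illustration that is not part of the lecture, the following numpy sketch summarizes these relations with made-up values for alpha, alpha zero, and the example points: alpha transpose x plus alpha zero is exactly zero for points on the hyperplane, the difference of two points on the hyperplane is orthogonal to alpha, and dividing by the two-norm of alpha turns the decision value into a signed distance.

import numpy as np

# Made-up hyperplane parameters: alpha points in the normal direction,
# alpha_0 is the bias.
alpha = np.array([2.0, 1.0])
alpha_0 = -4.0

def decision_value(x):
    # alpha(x) = alpha^T x + alpha_0, exactly zero for points on the hyperplane
    return alpha @ x + alpha_0

def signed_distance(x):
    # dividing by the two-norm of alpha gives a true signed distance
    return decision_value(x) / np.linalg.norm(alpha)

x1 = np.array([1.0, 2.0])   # on the hyperplane: 2*1 + 1*2 - 4 = 0
x2 = np.array([2.0, 0.0])   # also on the hyperplane: 2*2 + 1*0 - 4 = 0
x3 = np.array([3.0, 3.0])   # off the hyperplane

print(decision_value(x1))   # 0.0
print(alpha @ (x1 - x2))    # 0.0: the difference of two points on the plane is orthogonal to alpha
print(signed_distance(x3))  # positive, since x3 lies on the side the normal vector points to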

Part of a video series:
Accessible via: Open Access
Duration: 00:14:48 min
Recording date: 2020-11-11
Uploaded on: 2020-11-11 14:37:41
Language: en-US

In this video, we explain the basic concept of the support vector machine.

This video is released under CC BY 4.0. Please feel free to share and reuse.

For reminders to watch the new video follow on Twitter or LinkedIn. Also, join our network for information about talks, videos, and job offers in our Facebook and LinkedIn Groups.

Music Reference: Damiano Baldoni - Thinking of You
